Skip to content

✨ Add support for installing bundles with webhooks #1914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

perdasilva
Copy link
Contributor

@perdasilva perdasilva commented Apr 11, 2025

Description

Depends on #1893

Adds webhook support to bundle renderer with certificate management. Based on Joe's PoC #1506

Only the cert-manager provisioner is implemented in this PR. In a follow up, we'll add the openshift-serviceca provisioner and a way to configure OLM for one or the other.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 11, 2025
Copy link

netlify bot commented Apr 11, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 7d0dce0
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/681890eeae7f3a0008bf4bfe
😎 Deploy Preview https://deploy-preview-1914--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@perdasilva perdasilva changed the title Add wh support ✨ Add support for bundles with webhooks Apr 11, 2025
@perdasilva perdasilva changed the title ✨ Add support for bundles with webhooks ✨ Add support for installing bundles with webhooks Apr 11, 2025
Copy link

codecov bot commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 90.34608% with 53 lines in your changes missing coverage. Please review.

Project coverage is 68.51%. Comparing base (061da6a) to head (7d0dce0).

Files with missing lines Patch % Lines
...-controller/rukpak/render/generators/generators.go 87.83% 17 Missing and 10 partials ⚠️
...operator-controller/rukpak/util/testing/testing.go 0.00% 14 Missing ⚠️
...troller/rukpak/render/certproviders/certmanager.go 88.00% 4 Missing and 5 partials ⚠️
internal/operator-controller/rukpak/util/util.go 85.71% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1914      +/-   ##
==========================================
+ Coverage   66.81%   68.51%   +1.70%     
==========================================
  Files          75       76       +1     
  Lines        6335     6836     +501     
==========================================
+ Hits         4233     4684     +451     
- Misses       1839     1873      +34     
- Partials      263      279      +16     
Flag Coverage Δ
e2e 42.79% <18.69%> (-2.49%) ⬇️
unit 58.80% <89.98%> (+2.43%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 17, 2025
@perdasilva perdasilva marked this pull request as ready for review April 28, 2025 13:12
@perdasilva perdasilva requested a review from a team as a code owner April 28, 2025 13:12
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2025
@openshift-ci openshift-ci bot requested review from grokspawn and OchiengEd April 28, 2025 13:13
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 28, 2025
APIVersion: admissionregistrationv1.SchemeGroupVersion.String(),
},
ObjectMeta: metav1.ObjectMeta{
GenerateName: fmt.Sprintf("%s-", generateName),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using GenerateName is what OLMv0 does, correct? I think it does this so that multiple validating/mutating webhook configuration objects can be created when own/single namespace bundles are installed multiple times.

That is an anti-goal for OLMv1, so perhaps it makes sense to deviate from OLMv0's use of GenerateName in this case.

Unless there is a really strong reason not to, I'd prefer to use a deterministic name here instead, as that would ultimately make the set of on-cluster objects deterministic as well. I'm sure there are many benefits, but one that comes to mind is that ClusterExtension SAs would be able to specify the deterministic name in advance when setting up RBAC for get/update/patch/delete verbs with the correct resourceName.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit: using a deterministic name here means that this webhook can only be installed once, regardless of the install modes supported by the extension. That is directly in line with one of the stated invariants of OLMv1 (no two cluster extensions can manage the same object)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to use Name instead. I'm trying to think if there's foot gun potential. I might be a little more defensive here and ensure that any - suffix is removed from what is defined in the CSV to avoid any bad name errors.

We might already, but I'll double check and ensure that all webhook names defined in the CSV (for a particular type of webhook) are unique.

I guess if there is a naming clash, i.e. two different bundles call their webhook "my-webhook", the user can just install the bundle in another namespace as an escape hatch...wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, now I understand better the @joelanford recommendation/idea: https://redhat-internal.slack.com/archives/C0881N26CGP/p1745960594662879?thread_ts=1745942335.360109&cid=C0881N26CGP

I am OK with since in OLMv1 we should have one instance of each project only the cluster

👍

@@ -17,6 +19,21 @@ func ObjectNameForBaseAndSuffix(base string, suffix string) string {
return fmt.Sprintf("%s-%s", base, suffix)
}

func ToUnstructured(obj client.Object) (*unstructured.Unstructured, error) {
gvk := obj.GetObjectKind().GroupVersionKind()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A client.Object is not guaranteed to have a populated GVK. If we need guarantees about setting the GVK on the unstructured type, we should also pass in a scheme so that we can lookup the GVK based on the object type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I've just hardened it by checking if version and kind are empty. I don't think we need to add the complexity of passing a scheme. At least not for now =D

Copy link
Contributor

@camilamacedo86 camilamacedo86 Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the GKV and use it is the right approach then, I think we should use the controller-runtime implementation to do so ( GVKForObject ): https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/apimachinery.go#L95C6-L95C18

Copy link
Contributor Author

@perdasilva perdasilva Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need it right now. The precondition for the function is that the client.Object will carry the GVK. If not, then error. But, I'd say check how we're using it in this PR and let me know if that's not the right way to create an object with GVK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that's acceptable since we only use it with CertManager? That way, we'll have the GKV. Is that what you were thinking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't get what it buys us apart from allowing the function to take client.Objects that don't have a gvk set.

I guess my question is, if we do:

ToUnstructured(&certmanagerv1.Certificate{
		TypeMeta: metav1.TypeMeta{
			APIVersion: certmanagerv1.SchemeGroupVersion.String(),
			Kind:       "Certificate",
		}
		...
)

do we get anything out of adding the scheme? Or would adding it just enable us to do something like:

ToUnstructured(&certmanagerv1.Certificate{}, scheme)

or take arbitrary client.Objects at the cost of creating a new scheme and registering the certmanagerv1 scheme to it?

Since we don't have a need for handling arbitrary client.Objects, is there any point in making the method more complex right now?

@@ -13,6 +13,7 @@ const (
// Ex: SomeFeature featuregate.Feature = "SomeFeature"
PreflightPermissions featuregate.Feature = "PreflightPermissions"
SingleOwnNamespaceInstallSupport featuregate.Feature = "SingleOwnNamespaceInstallSupport"
WebhookSupport featuregate.Feature = "WebhookSupport"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to add support for new providers later, how do you envision that looking from a feature gate standpoint?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, downstream will use a different provider, but the implementation remains the same.

I don’t see a strong need to support multiple providers upstream for now — downstream will likely stick with one as well. Since this is under an alpha flag, we have flexibility to adjust later.

So I think this is fine as-is and no need to optimize prematurely. We can revisit if we decide to support more providers in the future.

joelanford
joelanford previously approved these changes Apr 29, 2025
Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! My only open question (that I already left in a code comment) is how we would introduce more certificate providers in the future?

Presumably we'd want a feature gate per provider, so that each provider could mature independently? If so, should we have CertManagerWebhookSupport for the current feature gate name?

I don't think we have to answer that question or change what we've got in the PR right now, so approving!

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@perdasilva perdasilva enabled auto-merge April 30, 2025 08:46
if _, ok := webhookNameSetByType[wh.Type]; !ok {
webhookNameSetByType[wh.Type] = sets.Set[string]{}
}
if webhookNameSetByType[wh.Type].Has(wh.GenerateName) {
Copy link
Contributor

@camilamacedo86 camilamacedo86 Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @perdasilva

we would need to do more checks than that:

  • The name cannot have more than : 253-character limit
  • The name need to be a valid DNS name;

We can use apimachinery to verify if is valid with: https://pkg.go.dev/k8s.io/apimachinery/pkg/util/validation#IsDNS1123Subdomain

https://github.com/kubernetes/apimachinery/blob/954960919938450fb6d06065f4bf92855dda73fd/pkg/util/validation/validation.go#L216-L229

So, it seems to be missing in the validator.
Could we please add this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add it. But, I think a better approach would be (if it's possible) to validate the csv against its openapi schema. But I don't know that there's a simple way to do that =S

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what did we do in operator-sdk? Do we have any bundle validation code/library we can rip out from that?

Copy link
Contributor

@camilamacedo86 camilamacedo86 Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add it. But, I think a better approach would be (if it's possible) to validate the csv against its openapi schema. But I don't know that there's a simple way to do that =S

That is definitely a nice idea. 👍

We’d need to try using the markers for validation: https://book.kubebuilder.io/reference/markers/crd-validation
Add the annotations and re-generate the CRDs where the types are defined.

I think it would be something like

// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:MinLength=1

what did we do in operator-sdk? Do we have any bundle validation code/library we can rip out from that?

I think we should consider checking this at runtime, since currently it isn’t validated. But, maybe it's a bit of an overstatement, though—if the name is invalid, the same check used by k8s.io/apimachinery/pkg/util/validation (IsDNS1123Subdomain) will be enforced by Kubernetes itself.

In the past, we used IsDNS1123Subdomain for scaffolding in the SDK (see: https://github.com/operator-framework/operator-sdk/pull/953/files). The SDK uses Kubebuilder, and in Kubebuilder we also rely on the same validation logic (we just copied it to avoid adding a dependency). See; https://github.com/search?q=repo%3Akubernetes-sigs%2Fkubebuilder%20IsDNS1123Subdomain&type=code

If we decide to add the check then, IMO the best approach is to use the checks from k8s.io/apimachinery/pkg/util/validation instead any internal lib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the validation functions that use the IsDNS1123Subdomain under the hood. I'm also cooking up a scheme where we generate a validator from based on an imported ClusterServiceVersion CRD. This will give us the ultimate security that the CSV is valid (though additional validators around referential integrity, uniqueness, etc. will still be required)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this validator will come in a follow up PR

camilamacedo86

This comment was marked as off-topic.

@@ -0,0 +1,122 @@
package render_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just add an e2e test before merging, or at plan to do so in a follow-up?

We might be able to reuse something from here:
https://github.com/search?q=repo%3Ak8s-operatorhub%2Fcommunity-operators%20generateName&type=code
(Since we’re using real operators) — this could help validate that the webhooks are working correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do this as a follow up since we have some work we need to do before we can enable e2es for FGs

camilamacedo86
camilamacedo86 previously approved these changes May 1, 2025
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perdasilva

Everything looks great to me! 👍
Awesome work 🥇

Just one thought: I think it would be valuable to have some e2e tests covering install/upgrade scenarios with webhooks.

Do you think we could add those before merging? Or, since the feature is behind a flag, maybe we could make sure to cover that in a follow-up?

Other small nits (nice-to-haves): #1914 (comment)

Aside from that, I'm all good with it.

/lgtm
/approved

@perdasilva perdasilva added this pull request to the merge queue May 1, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 1, 2025
@camilamacedo86 camilamacedo86 removed this pull request from the merge queue due to a manual request May 1, 2025
@camilamacedo86 camilamacedo86 requested a review from joelanford May 1, 2025 11:24
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 5, 2025
Copy link

openshift-ci bot commented May 5, 2025

New changes are detected. LGTM label has been removed.

Per Goncalves da Silva added 6 commits May 5, 2025 12:09
Signed-off-by: Per Goncalves da Silva <[email protected]>
Signed-off-by: Per Goncalves da Silva <[email protected]>
Signed-off-by: Per Goncalves da Silva <[email protected]>
Signed-off-by: Per Goncalves da Silva <[email protected]>
Signed-off-by: Per Goncalves da Silva <[email protected]>
Signed-off-by: Per Goncalves da Silva <[email protected]>
@perdasilva
Copy link
Contributor Author

Just one thought: I think it would be valuable to have some e2e tests covering install/upgrade scenarios with webhooks.

Do you think we could add those before merging? Or, since the feature is behind a flag, maybe we could make sure to cover that in a follow-up?

I think this falls under the discussion around having e2e tests for feature-gated code. I don't know that we've landed on a solid answer yet, and I think we may want a separate e2e suite (maybe?) for FGs? Because of this, I think it will come in a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants